Documentation for the Morphology Lab.
MS, 2004-2007
This documentation is minimal. You too can contribute to the NooJ project by writing fuller documentation, with exercises, in your own language!
The following morphological operators are available for any language:
<B>: keyboard Backspace
<C>: change Case
<D>: Duplicate current char
<E>: Empty string
<L>: keyboard Left arrow
<N>: go to end of Next word form
<P>: go to end of Previous word form
<R>: keyboard Right arrow
<S>: delete/Suppress current char
The following morphological operators can take an argument (a number or W): <B>, <L>, <N>, <P>, <R>, <S>:
xx (a number): repeat xx times
W: whole word
For example:
<B2>: delete the last two characters
<L3>: go left 3 times
<R4>: go right 4 times
<S5>: delete the next 5 characters
<BW>: delete from the current character all the way to the first character
<LW>: go to beginning of current word form
<RW>: go to the end of the current word form
<SW>: delete all the following characters of the current word form
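The operator-plus-argument notation above can be split into tokens mechanically. The following Python sketch is only an illustration, not NooJ's own parser: the function name tokenize and the token tuples are hypothetical, it covers only the generic operators listed above, and it assumes an operator without an argument counts as a repetition of 1.

```python
import re

# Generic operators only; language-specific operators are left out of this sketch.
TOKEN = re.compile(r"<([BCDELNPRS])(\d+|W)?>")

def tokenize(command):
    """Split a command into ('op', name, arg) and ('text', literal) tokens."""
    tokens = []
    pos = 0
    for m in TOKEN.finditer(command):
        if m.start() > pos:
            # Literal characters between two operators
            tokens.append(("text", command[pos:m.start()]))
        name, arg = m.group(1), m.group(2)
        if arg is None:
            count = 1          # assumption: no argument means "once"
        elif arg == "W":
            count = "W"        # whole word
        else:
            count = int(arg)
        tokens.append(("op", name, count))
        pos = m.end()
    if pos < len(command):
        tokens.append(("text", command[pos:]))
    return tokens
```

For instance, tokenize("<B3>women") gives [('op', 'B', 3), ('text', 'women')].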
Language-Specific morphological operators for Catalan, Español, Français, ελληνικά and Português:
<Á>: add acute accent
<À>: add grave accent
<Ä>: add dieresis
<Â>: add circumflex
<A>: remove Accent
Examples of language-specific morphological operators for עברית:
<F>: Finalize current letter if not a final letter; unFinalize current letter if letter is final
<G>: insert daGesh if current letter is begadkefat
<H>: insert atef-pataH if current letter is guttural, or shwa
<M>: delete current letter, taking dagesh and shin/sin dot into account
There are also specific morphological operators for العربية: <M>, <T> and <Z>.
Commands are suffixes to be added to the lemma in order to produce the inflected form:
cousin [es] => cousines
The morphological operator <B> (“Backspace”) is used to delete characters from the lemma:
voler [<B>] => vole
voler [<B>a] => vola
cheval [<B>ux] => chevaux
recordman [<B3>women] => recordwomen
One can delete all the letters of the lemma in order to build an irregular inflected form:
avoir [<B5>ont] => ont
avoir [<BW>ont] => ont
More complex commands make operations as general as possible. For instance, the following three verbs inflect with the same command, so they can be associated with the same inflectional paradigm:
lever [<L3><B>è<R2><S>nt] => lèvent
mener [<L3><B>è<R2><S>nt] => mènent
semer [<L3><B>è<R2><S>nt] => sèment
The <N> and <P> commands are used to inflect Multi-Word Units (or “compounds”). <N> moves the cursor to the end of the next component of the MWU; <P> moves the cursor to the end of the previous component:
cousin germain [e<P>e] => cousine germaine
cousin germain [s<P>s] => cousins germains
cousin germain [es<P>es] => cousines germaines
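The component-by-component movement of <N> and <P> can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not NooJ's implementation: the name inflect_mwu is hypothetical, components are assumed to be separated by single spaces, the cursor is assumed to start at the end of the last component, and only <N>, <P> (with an optional repeat count) and literal text are supported.

```python
import re

def inflect_mwu(lemma, command):
    """Apply literal suffixes and <N>/<P> moves to a multi-word lemma."""
    words = lemma.split(" ")
    idx = len(words) - 1  # assumption: cursor starts at the end of the last component
    for op, count, text in re.findall(r"<([NP])(\d*)>|([^<]+)", command):
        if text:
            # The cursor always sits at the end of a component here, so text is appended
            words[idx] += text
        else:
            step = int(count or 1)
            idx += step if op == "N" else -step
            # Assumption: moving past the first or last component is clamped
            idx = max(0, min(idx, len(words) - 1))
    return " ".join(words)
```

For example, inflect_mwu("cousin germain", "es<P>es") first appends "es" to "germain", then moves to the end of "cousin" and appends "es" there.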